162 research outputs found

Exploratory Mediation Analysis with Many Potential Mediators

Authors: Oberski Daniel L.; van Kesteren Erik-Jan
Publication venue: 'Informa UK Limited'
Publication date: 02/03/2019

Social and behavioral scientists are increasingly employing technologies such as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional' datasets with more columns than rows. There is increasing interest, but little substantive theory, in the role the variables in these data play in known processes. This necessitates exploratory mediation analysis, for which structural equation modeling is the benchmark method. However, this method cannot perform mediation analysis with more variables than observations. One option is to run a series of univariate mediation models, which incorrectly assumes independence of the mediators. Another option is regularization, but the available implementations may lead to high false positive rates. In this paper, we develop a hybrid approach which uses components of both filter and regularization: the 'Coordinate-wise Mediation Filter'. It performs filtering conditional on the other selected mediators. We show through simulation that it improves performance over existing methods. Finally, we provide an empirical example, showing how our method may be used for epigenetic research.

Comment: R code and package are available online as supplementary material at https://github.com/vankesteren/cmfilter and https://github.com/vankesteren/ema_simulation

arXiv.org e-Print Archive

Utrecht University Repository

FigShare
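
The abstract above mentions running "a series of univariate mediation models" as one existing option. A minimal sketch of that baseline may make the idea concrete. It is not the authors' Coordinate-wise Mediation Filter and not the API of the `cmfilter` package. The function names (`ols`, `univariate_mediation_filter`), the Sobel product-of-coefficients test, the 1.96 cutoff, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit of y on X; returns coefficients and standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se


def univariate_mediation_filter(x, M, y, z_crit=1.96):
    """Screen each column of M on its own, ignoring all other mediators."""
    n, p = M.shape
    ones = np.ones(n)
    selected = []
    for j in range(p):
        m = M[:, j]
        # a path: regress the candidate mediator on x
        beta_a, se_a = ols(np.column_stack([ones, x]), m)
        # b path: regress y on x and the candidate mediator
        beta_b, se_b = ols(np.column_stack([ones, x, m]), y)
        a, sa = beta_a[1], se_a[1]
        b, sb = beta_b[2], se_b[2]
        # Sobel z-statistic for the indirect effect a * b
        z = a * b / np.sqrt(a**2 * sb**2 + b**2 * sa**2)
        if abs(z) > z_crit:
            selected.append(j)
    return selected


# Simulated high-dimensional data (hypothetical): more columns than rows,
# with only column 0 acting as a true mediator.
rng = np.random.default_rng(0)
n, p = 100, 150
x = rng.normal(size=n)
M = rng.normal(size=(n, p))
M[:, 0] += 0.8 * x
y = 0.8 * M[:, 0] + rng.normal(size=n)

selected = univariate_mediation_filter(x, M, y)
print(selected)
```

Because each candidate is tested separately, the loop runs even when there are more mediators than observations, but it treats the mediators as independent, which is the limitation the abstract points out.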

The Expected Parameter Change (EPC) for local dependence assessment in binary data latent class models

Authors: Oberski Daniel L.; Vermunt Jeroen K.
Publication date: 01/01/2018

Binary data latent class models crucially assume local independence, violations of which can seriously bias the results. We present two tools for monitoring local dependence in binary data latent class models: the "Expected Parameter Change" (EPC) and a generalized EPC, estimating the substantive size and direction of possible local dependencies. The asymptotic and finite sample behavior of the measures is studied, and two applications to the U.S. Census estimation of Hispanic ethnicity and medical experts' ratings of x-rays demonstrate their value in arriving at a model that balances realism and parsimony.

Comment: R code implementing our proposal and including both example datasets is available online as supplementary material

arXiv.org e-Print Archive

Tilburg University Repository

Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions

Authors: Fang Qixiang; Nguyen Dong; Oberski Daniel L
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/02/2022

Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social science research. We therefore propose the use of the classic construct validity framework to evaluate the validity of text embeddings. We show how this framework can be adapted to the opaque and high-dimensional nature of text embeddings, with application to survey questions. We include several popular text embedding methods (e.g. fastText, GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct validity analyses. We find evidence of convergent and discriminant validity in some cases. We also show that embeddings can be used to predict respondents' answers to completely new survey questions. Furthermore, BERT-based embedding techniques and the Universal Sentence Encoder provide more valid representations of survey questions than do others. Our results thus highlight the necessity to examine the construct validity of text embeddings before deploying them in social science research.

Comment: Under review

arXiv.org e-Print Archive

Utrecht University Repository

Updating latent class imputations with external auxiliary variables

Authors: Boeschoten L.; de Waal T.; Oberski Daniel L.; Vermunt J.K.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

Tilburg University Repository

Differential privacy and social science: An urgent puzzle

Authors: Kreuter Frauke; Oberski Daniel L.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2020

Accessing and combining large amounts of data is important for quantitative social scientists, but increasing amounts of data also increase privacy risks. To mitigate these risks, important players in official statistics, academia, and business see a solution in the concept of differential privacy. In this opinion piece, we ask how differential privacy can benefit from social-scientific insights, and, conversely, how differential privacy is likely to transform social science. First, we put differential privacy in the larger context of social science. We argue that the discussion on implementing differential privacy has been clouded by incompatible subjective beliefs about risk, each perspective having merit for different data types. Moreover, we point out existing social-scientific insights that suggest limitations to the premises of differential privacy as a data protection approach. Second, we examine the likely consequences for social science if differential privacy is widely implemented. Clearly, workflows must change, and common social science data collection will become more costly. However, in addition to data protection, differential privacy may bring other positive side effects. These could solve some issues social scientists currently struggle with, such as p-hacking, data peeking, or overfitting; after all, differential privacy is basically a robust method to analyze data. We conclude that, in the discussion around privacy risks and data protection, a large number of disciplines must band together to solve this urgent puzzle of our time, including social science, computer science, ethics, law, and statistics, as well as public and private policy.

MAnnheim DOCument Server